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ABSTRACT 

We present the pseudo-particle multipole method (P 2 M 2 ), a new method to handle multipole 
expansion in fast multipole method and treecode. This method uses a small number of 
pseudo-particles to express multipole expansion. With this method, the implementation of FMM 
and treecode with high-order multipole terms is greatly simplified. We applied P 2 M 2 to treecode 
and combined it with special-purpose computer GRAPE. Extensive tests on the accuracy and 
calculation cost demonstrate that the new method is quite attractive. 



Introduction 



In this paper, we describe the pseudo-particle multipole method (P M , Makino 1998), a new 



method to express multipole expansion in fast multipole method(FMM, |Greengard and Rokhlin 1987| ) and 
treecode( Barnes and Hut 1986| ). The basic idea of P 2 M 2 is to use a small number of pseudo-particles to 
express the multipole expansion. In other words, this method approximates the potential field of physical 
particles by the field generated by a small number of pseudo-particles. The distribution of pseudo-particles 
is determined so that it correctly describes the coefficients of multipole expansion. 

P 2 M 2 offers two advantages over standard FMM which uses the multipole expansion directly. One 
advantage is its simplicity. As will be seen, the translation formulae used in P 2 M 2 are much simpler than 
their counterpart in standard FMM. Although the calculation cost is roughly the same for the same level of 
the accuracy, the simplicity implies that performance tuning and parallelization are easier. 



Another advantage is that P 2 M 2 can take full advantage of special-purpose computers(GRAPE, jMakino 
and Taiji 1998; Sugimoto et al. 1990). GRAPE is a pipelined processor specialized to direct force calculation 
between particles. It offers the price-performance 100-1000 times better than that of general-purpose 



computers, for direct force calculation. Though treecode has been used on GRAPE ( Athanassoula et al 



1998; Fukushigc et al. 1991; Makino 1991), it was difficult to go to high accuracy since only monopole can 



be calculated on GRAPE. With P 2 M 2 we can evaluate high-order terms using GRAPE, since these terms 
are expressed by distribution of pseudo-particles. 

This paper is organized as follows. In section || we briefly describe the treecode and the FMM. In 
section || we give the description of Anderson's method ( Anderson 1992 ), to which our method is closely 
related. In section || we describe the mathematics of P 2 M 2 . In section || we present the result of numerical 
tests on the accuracy and the calculation cost for our implementation of treecode with P 2 M 2 . In section 
H we discuss the implementation of treecode with P 2 M 2 on GRAPE. In section we compare P 2 M 2 with 
Anderson's method. In section H we summarize this paper. 
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Treecode and FMM 



Here we give brief description of treecode and FMM. The treecode is described in section 2.1 and the 
FMM is described in section 2.2. 



2.1. Treecode 

In treecode, the forces from a group of distant particles are approximated by multipole expansions. 
Hierarchical tree structure is used for grouping of the particles. 



Here we summarize the calculation procedure following Hernquist 1987; Hcrnquist and Katz 1989 
First we construct an oct-trce structure by hierarchical subdivision of the space. The division procedure 
starts from the root node (root cell) of the tree which corresponds to a cube covering the entire system. The 
procedure is repeated until all leaf cells contain only one or zero particles. 

In the next step we calculate the multipole expansions for all non-leaf cells. The calculation begins from 
the parents of the leaf cells and continued to the root cells ascending the tree structure. For a cell whose 
children are all leaves, the multipole expansion at its center is directly calculated from the distribution 
of the particles (leaves) in it. The expansion of a higher level cell is calculated from the expansions of its 
children. For each child cell, the center of the expansion is shifted to the center of the parent(M2M shift). 
All shifted expansions are then summed up at the center of the parent cell. Note that in most of existing 
implementation of treecode only up to quadrupole term is retained, and the center of mass is used as the 
center of expansion. 

Then we calculate the total force on each particle. Starting from the root cell, we recursively traverse 



the tree structure collecting the force from cells( Barnes and Hut 1986 ). We examine whether the cell in 



question is well separated from the particle or not. If the cell is well separated, the multipole expansion of 
the cell is evaluated at the position of the particle and added to the total force on the particle. In the case 
of a leaf cell, the force from the particle in it is used instead of the multipole expansion. If the cell is not 
well separated, we descend the tree to resolve the current cell into child cells, an then recursively examine 
each child in the same way. The condition that a cell is well separated is expressed as l/d < 9. Here I is the 
side length of the cell, d is the distance between the cell and the particle, and 9 is the opening angle that 
controls the accuracy. Leaf cells are always considered to be well separated. 

The calculation cost of treecode is 0(N log N), since the particle sees larger cell as distance becomes 
larger. 

Treecode is widely used in astrophysics, in particular where accuracy requirement is modest. Most of 
existing implementations of treecode use only up to quadrupole moment and calculation cost rises quickly 



when high accuracy is required. The implementation detail is given in Hcrnquist 1987; Hcrnquist and Katz 
198G. For the implementation on distributed-memory parallel computers, see fBalmon et al. 1994 . 



2.2. FMM 

Figure [[] shows the conceptual difference between treecode and FMM. In treecode, the forces from a 
cell to different particles are evaluated independently. In FMM, on the other hand, to calculate forces from 
cell A to particles in cell B, we first calculate the spherical harmonics expansion(local expansion) of the 
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potential field at the center of cell B and then evaluate that expansion at the position of particles in cell B. 



FMM was first presented in Grcengard and Rokhlin 1987 for two dimensional case and was extended 



Schmidt and Lee 1991 



to three dimension in Greengard and Rokhlin 1988. The implementation detail for the three dimensional 



case is given m |Board et al. 1994| ; preengard and Rokhlin 1988; ; |Hu et al. 1996| ; |Pfalzncr and Gibbon 199^ ; 
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Fig. 1. — Approximation used in treecode(left) and FMM (right). 



In the following we describe the non-adaptive version of FMM for three dimensional case. First we 
construct an oct-tree structure by hierarchical subdivision of the space. The division procedure starts from 
the root cell at refinement level R — 0, which covers the entire system. Here we define the refinement level 
R as the depth of the tree. We repeat the procedure until a given refinement level i? max is reached. The 
level i? max is chosen so that the average number of particles in one leaf cell roughly equals the prescribed 
number which is determined to optimize the calculation speed. 

In the next step, we calculate the multipole expansions for all cells. This part is the same as that for 



treecode described in section 2.1. 



Then, for each cell, we calculate the local expansion due to its interaction cells. Figure |2| shows the 
relation between a cell and its interaction cells. Interaction cells of a cell are defined as the children of its 
parent's neighbor cells which are not its own neighbors. Here, neighbor cells are the cells at the same level 
which are in contact with the cell. The contribution from the interaction cells are calculated by converting 
their multipole expansion to the local expansions at the center of the objective cell(M2L conversion), and 
then summing them up. 

In the next step, we add up the local expansions at different levels to obtain the total potential field at 
the leaf cells. We start calculation at R — 3. For all cells in R = 3, we shift the center of the local expansion 
of its parent (L2L shift), and then add it to the local expansion of the cell. By this procedure, all cells in 
R = 3 will have the local expansion of the total potential field except for the contribution of the neighbor 
cells. By repeating this procedure for all levels, we obtain the potential field for all leaf cells. 

Finally, we calculate the total force on each particle. The total force is calculated as a sum of the 
distant and the neighbor contributions. The distant part is calculated by evaluating the local expansion 
of the leaf cell at the position of the particle. The neighbor part is directly calculated by evaluating the 
particle-particle forces. 

In the following, we summarize the mathematics used in FMM. FMM requires two types of expansion 
of the potential and three transformations of them. One of the two expansions is the multipole expansion 
and the other is the local expansion. The multipole expansion of the potential $(r) up to p-th. order is 
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Fig. 2. — Relation between interaction cells (shaded) and neighbor cells of a hatched cell. 



expressed as 

p m=l m 

*w = E E ifeinM) a) 

in spherical coordinates r{r,6,4>). Here Y" ; m (#, 0) is the spherical harmonics and a™ are the expansion 
coefficients. In order to approximate the potential field due to the distribution of particles, the coefficients 
should satisfy 

4 N 

"r = ^E m ^ m *(^)< ( 2 ) 

i=l 

where N is the number of particles to be approximated, rm and (rj, 9i,(f>i) are the masses and positions of 
the particles, and * denotes the complex conjugate. The local expansion *S?(r) up to p-th order is given by 

p m—l 

*(r)=4^^/? 1 Vy i ™WA (3) 

1=0 m=-l 

where /3™ is the expansion coefficients. 

The three transformation required for FMM are the M2M shift, the M2L conversion, and the L2L shift. 
These transformation are expressed as follows: 



a„ = 



p m=l 

E E T l^M a T ■ M2M, (4) 

Z=0 m =-l 
p m=l 

W = E E T v™M a T ■ M2L ' (5) 

1=0 m=-l 
p m=l 

&v = E E T vL-, m dT ■ L2L. (6) 

1=0 m=-l 
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Using 



(2^ + l) 1/2 



aim ~ { 1} m [47r(J + m)!(J-m)!]i/2' (?) 



the transformation matrices are expressed as 



t mm = , (-rt)''- 1 ^;- m> (g t ,^)oi>- i , m -- m a> TO (2i' + i) 

Z ' m '' im 77 (2/ + l)[2(Z'-l) + l]oj/ TO / ' 1 ' 

rpLM , (-i)' +m ^+r m) *(^K^w rql 

\f + < +1 (2Z + l)(2Z' + l)a w _ m ' W 

rriLL , r t Y l-l> {Ot,<i>t)ai> m >ai-i>, m - m > , , 

z ' m '' im ^ a, m (2/' + l)[2(i -/') + !] ' 

Here, (rt,0t, <t>t) is the position of the new origin relative to the old one. 

The calculation cost of FMM is O(N). Therefore the scaling of FMM is better than that of treecode. 
However, at least in three dimension, comparisons indicate that treecode is faster for realistic number of 



particles (Blackston and Sucl 1997). 



Anderson's Method 



Anderson ( Anderson 1992Q proposed a simple formulation of FMM based on Poisson's formula. 
Poisson's formula gives the solution of the boundary value problem of the Laplace equation. When potential 
on the surface of a sphere of radius a is given, the potential <E> at position r is expressed as 



J S — — n 



(11) 



for r > a, and 



p OO 



(12) 



for r < a. Here $(as) is the given potential on the sphere surface. The range S of the integration covers 
the surface of a unit sphere centered at the origin. The function P n denotes the n-th Legendre polynomial. 

In Anderson's method the value of <!>(as) is used to express the multipole/local expansion, while the 
original FMM uses the coefficients of the expansion terms. The advantage of Anderson's method over 



the original FMM is its simplicity. The translation operators described in section 2.2 are all realized by 
evaluating the potential on points of the target sphere due to the source sphere. All such evaluations are 
performed using equation (|ll|) and (|l2|). These are by far simpler to implement the original FMM using 
equation®-©. 



In order to evaluate the integral in equation (|ll|) and (12) numerically, we need to sample the values of 
$(as) on the surface of the sphere. The number of sample points required and the optimal positions of such 
points are given in Hardin and Sloanc 1996 . 
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4. Pseudo-Particle Multipole Method 

P 2 M 2 is quite similar to Anderson's method. The difference is that in P 2 M 2 we use the mass 
distribution on the surface of a sphere instead of the potential. The continuous mass distribution is again 
approximated by finite number of pseudo-particles, and the potential exerted by these pseudo-particles are 
calculated in the same way as that exerted by physical particles. 

Conceptually, physical particles are converted to pseudo-particles in the following two steps. First 
we calculate multipole expansion. Then we assign mass to pseudo-particles so that they have the same 
multipole expansion as physical particles. 

Calculation procedure of these two steps are as follows: In the first step we expand the potential with 



spherical harmonics. As we have seen in section 2.2, the multipole expansion and its coefficients are given 
by equation^) and (||). In the second step we find a continuous mass distribution p(a, 9, (f>) on a sphere of 
radius a, which approximate the potential field. Then we approximate that distribution by pseudo-particles. 
The mass distribution should satisfy 

4W+ 2 r 

«I" = J P(a, 0, 4>)Yr(0, <j>)da, (13) 

for < I < p. Here p is the highest order of multipole expansion to express and S denotes the surface of the 
unit sphere. Because the spherical harmonics comprise an orthonormal system, p(a, 9, <ft) is expressed as 

p 1 o/ + 1 

p(a, M) = £ £ I^^™ 1 ™ ( 14 ) 

(=0 m=-l 



Following Hardin and Sloane 1996, we place K points on the sphere so that they satisfy 



X}j=i fi r j)/K = Is /( as )^ s J where / is any polynomial of degree at most [2p + 1] (order [2p + 1] is 
necessary to guarantee orthogonality of spherical harmonics of up to order p), and Tj are the positions of 



the distributed points. Thus cquation(14) is replaced by 



f EE ^TYne^y (is) 

1=0 m=-l 



where rrij are the masses of pseudo-particles located at rj(a,9j,(f>j). In practice, the masses mj of 
pseudo-particles are calculated directly from the positions r^r^, 6i,4>i) and masses mi of physical particles. 
Combining equation^]) and (|l^), rrij is expressed as 



47T P 1 N 



m 

1=0 m=-l i=l 



Applying the addition theorem of spherical harmonics to equation(16), we obtain: 

AT 

21 + 1 /r,V 



E Wi E — k~ (~) P K C0S 7ij), (17) 



3 — 

i=l 1=0 



where 7^ is the angle between and Tj. 
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5. Accuracy of the Force Calculated with P 2 M 2 

Here we present the result of numerical tests for our implementation of treecode with P 2 M 2 . In section 



5.1 we compare the accuracy of P 2 M 2 and the original FMM for single particle. In section |5.2| we present 



the relation between the accuracy and the calculation cost for our implementation of treecode with P M 



5.1. Accuracy of the Force Exerted from One Particle 

We measured the accuracy of the potential calculated with P 2 M 2 and the original multipole expansion. 
A point mass located at p(l, 0, 0) in spherical coordinate is used as the source of the potential. In figure || 
the error of the potential evaluated at point q(q, 2ir/3, 0) is plotted as a function of q for both P 2 M 2 and 
the original multipole expansion. We can see that their agreement is quite good. 



5.2. Accuracy of the Total Force 

We measured the accuracy of the force calculated by the treecode with P 2 M 2 . We used 262144 
equal-mass particles uniformly distributed in a sphere. We measured the relative error e of the force 
averaged over all the particles, and the calculation cost c for p = 1, 2, and 4. The average error e is defined 

as 

"-n^-^TT' (18) 

where is the force calculated with treecode and f \ is the force calculated with direct summation method. 
We define the cost c as the average number of (pseudo)-particle-particle interactions evaluated for one 
particle. Figure H shows the results. We can see that P 2 M 2 version with high-order terms is faster than the 
original version with monopole term, when high accuracy (e < 10 ) is required. 



6. P 2 M 2 Treecode on GRAPE 



We implemented the treecode with P 2 M 2 on GRAPE (Makino and Taiji 1998; Sugimoto et al 
1990| ). GRAPE is a special-purpose computer for gravitational TV-body problem (figure ^). It consists of 



general-purpose host and special-purpose backend which calculates particle-particle interaction. 

With P 2 M 2 , high-order terms of multipole expansion are expressed as particles. Therefore we can 
implement FMM and treecode with arbitrary order of multipole expansion on GRAPE systems. Previously, 
we could use GRAPE only to evaluate monopole terms. 

Figure f| shows the timing results of our treecode. The CPU time per timestep is measured on DEC 



AlphaServer 8400 5/300 (DEC 21164, 300MHz, one processor) with and without GRAPE-4( Makino et al. 



1997). We used a uniform distribution of 262144 particles in a sphere. The calculation with GRAPE-4 is 
10-100 times faster than that without GRAPE-4. The gain becomes larger as we increase p or decrease 9. 
Thus we can conclude that the combination of P 2 M 2 and GRAPE is quite attractive if high accuracy is 
required. 
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Fig. 3. — The error of potential calculated with P 2 M 2 (solid) and original multipolc expansion (dashed). 
The error of potential generated by one particle at p are plotted against the distance from the origin to the 
evaluation point q. The error e is defined as the difference from the exact potential. From top to bottom, 
four curves are for p = 1, 2, 4 and 8, respectively. The sphere radius a is 1.0. 
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Fig. 4. — The error e plotted against the calculation cost c(top-left panel) and the opening angle ^(top-right 
panel). The relation between c and 9 is also shown in the bottom-left panel. Thick and thin curves denote 
treecode with and without P 2 M 2 , respectively. Solid, long dashed, and short dashed curves are for p = 1, 2, 
and 4, respectively. 
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Fig. 5.— The GRAPE system. 




Fig. 6. — Performance of P 2 M 2 treecode with and without GRAPE-4. The CPU time per one timestep is 
plotted for the opening angle 0(left panel) and the force calculation error e(right panel). Thick and thin 
curves are the results with and without GRAPE-4, respectively. Solid, long dashed, and short dashed curves 
are for p = 1, 2, and 4, respectively. 
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7. Relation between P 2 M 2 and Anderson's Method 

P 2 M 2 and Anderson's method are quite similar. Both approximate the multipole expansion by discrete 
values on points on a sphere. The difference is that P 2 M 2 assigns mass to points while Anderson's method 
assigns potential. 

P 2 M 2 has two advantages over Anderson's method. One is that the calculation of M2L part is 
significantly simpler for P 2 M 2 since the expanded potential $ is evaluated as the summation of the potential 
due to pseudo-particles. The expression of <& in P 2 M 2 is given by 

$ w=Ev' (19) 

while the expression for Anderson's method is given by equation(Jil"|). Since M2L is dominant part in FMM 
calculation, total calculation speed for the same expansion order would be faster for P 2 M 2 . As we have 
seen in section ||, P 2 M 2 has an additional advantage that it can make use of special-purpose computers to 
achieve further speed up. 

The other advantage is that the accuracy of the force calculated with P 2 M 2 is higher than that with 
Anderson's method for the same expansion order. In Anderson's method, the potential due to particles 
are converted to the potential on the sphere surface. The potential evaluated during this conversion is 
expressed as mt/r and not truncated to the p-ih order. Therefore "aliasing error" is introduced into the 
potential during this conversion. This error tends to degrade the accuracy. In P 2 M 2 , this aliasing error is 
completely suppressed. Of course, it is easy to modify Anderson's method to use the truncated form. Such 
a modification would significantly improve the overall accuracy. 

8. Summary 

In this paper, we present P 2 M 2 , a new method to express multipole expansion used in FMM and 
treecode. In P 2 M 2 , multipole expansion is expressed by pseudo-particles on a sphere. P 2 M 2 has two 
advantages over the original multipole expansion. One is its simplicity and the other is that it can effectively 
use special-purpose computers. We implemented treecode with P 2 M 2 and evaluated its performance with 
and without GRAPE-4. We found that GRAPE-4 accelerates the calculation by a factor of 10-100. We 
conclude that P 2 M 2 is a quite attractive method to implement FMM and treecode. 
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